{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Shap-E"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Shap-E is a conditional model for generating 3D assets that can be used in video game development, interior design, and architecture. It is trained on a large dataset of 3D assets, which is post-processed to render more views of each object and to produce point clouds with 16K points instead of 4K. The Shap-E model is trained in two steps:\n",
    "\n",
    "1. An encoder accepts the point clouds and rendered views of a 3D asset and outputs the parameters of implicit functions that represent the asset.\n",
    "2. A diffusion model is trained on the latents produced by the encoder to generate either neural radiance fields (NeRFs) or a textured 3D mesh, making it easier to render and use the 3D asset in downstream applications.\n",
    "\n",
    "This guide will show you how to use Shap-E to start generating your own 3D assets!\n",
    "\n",
    "Before you begin, make sure you have the following libraries installed:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# uncomment to install the necessary libraries in Colab\n",
    "#!pip install -q diffusers transformers accelerate trimesh"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Text-to-3D"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To generate a gif of a 3D object, pass a text prompt to the [ShapEPipeline](https://huggingface.co/docs/diffusers/main/en/api/pipelines/shap_e#diffusers.ShapEPipeline). The pipeline returns a list of image frames for each prompt; the frames are rendered views of the generated 3D object."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import torch\n",
    "from diffusers import ShapEPipeline\n",
    "\n",
    "device = torch.device(\"cuda\" if torch.cuda.is_available() else \"cpu\")\n",
    "\n",
    "pipe = ShapEPipeline.from_pretrained(\"openai/shap-e\", torch_dtype=torch.float16, variant=\"fp16\")\n",
    "pipe = pipe.to(device)\n",
    "\n",
    "guidance_scale = 15.0\n",
    "prompt = [\"A firecracker\", \"A birthday cupcake\"]\n",
    "\n",
    "images = pipe(\n",
    "    prompt,\n",
    "    guidance_scale=guidance_scale,\n",
    "    num_inference_steps=64,\n",
    "    frame_size=256,\n",
    ").images"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now use the [export_to_gif()](https://huggingface.co/docs/diffusers/main/en/api/utilities#diffusers.utils.export_to_gif) function to turn the list of image frames into a gif of the 3D object."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from diffusers.utils import export_to_gif\n",
    "\n",
    "export_to_gif(images[0], \"firecracker_3d.gif\")\n",
    "export_to_gif(images[1], \"cake_3d.gif\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div class=\"flex gap-4\">\n",
    "  <div>\n",
    "    <img class=\"rounded-xl\" src=\"https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/shap_e/firecracker_out.gif\"/>\n",
    "    <figcaption class=\"mt-2 text-center text-sm text-gray-500\">prompt = \"A firecracker\"</figcaption>\n",
    "  </div>\n",
    "  <div>\n",
    "    <img class=\"rounded-xl\" src=\"https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/shap_e/cake_out.gif\"/>\n",
    "    <figcaption class=\"mt-2 text-center text-sm text-gray-500\">prompt = \"A birthday cupcake\"</figcaption>\n",
    "  </div>\n",
    "</div>"
   ]
  },
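  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make the gif export step less opaque, here is a minimal sketch of how a list of frames can be written as an animated gif with Pillow alone. It is an illustration under stated assumptions, not necessarily how `export_to_gif()` is implemented: the solid-color frames, the `demo_frames.gif` file name, and the 100 ms frame duration are all hypothetical stand-ins for real Shap-E output."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from PIL import Image\n",
    "\n",
    "# Hypothetical stand-in frames: solid colors instead of real Shap-E renders.\n",
    "frames = [Image.new(\"RGB\", (64, 64), (i * 12, 80, 160)) for i in range(20)]\n",
    "\n",
    "# Save the first frame and append the remaining ones as an animated, looping gif.\n",
    "frames[0].save(\n",
    "    \"demo_frames.gif\",\n",
    "    save_all=True,\n",
    "    append_images=frames[1:],\n",
    "    duration=100,  # milliseconds per frame (assumed value)\n",
    "    loop=0,  # 0 means loop forever\n",
    ")\n",
    "\n",
    "# Reopen the file to check how many frames were written.\n",
    "with Image.open(\"demo_frames.gif\") as gif:\n",
    "    print(f\"Frames in gif: {gif.n_frames}\")"
   ]
  },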
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Image-to-3D"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To generate a 3D object from an image, use the [ShapEImg2ImgPipeline](https://huggingface.co/docs/diffusers/main/en/api/pipelines/shap_e#diffusers.ShapEImg2ImgPipeline). You can use an existing image or generate an entirely new one. Let's use the [Kandinsky 2.1](https://huggingface.co/docs/diffusers/main/en/api/pipelines/kandinsky) model to generate a new image."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from diffusers import DiffusionPipeline\n",
    "import torch\n",
    "\n",
    "prior_pipeline = DiffusionPipeline.from_pretrained(\"kandinsky-community/kandinsky-2-1-prior\", torch_dtype=torch.float16, use_safetensors=True).to(\"cuda\")\n",
    "pipeline = DiffusionPipeline.from_pretrained(\"kandinsky-community/kandinsky-2-1\", torch_dtype=torch.float16, use_safetensors=True).to(\"cuda\")\n",
    "\n",
    "prompt = \"A cheeseburger, white background\"\n",
    "\n",
    "image_embeds, negative_image_embeds = prior_pipeline(prompt, guidance_scale=1.0).to_tuple()\n",
    "image = pipeline(\n",
    "    prompt,\n",
    "    image_embeds=image_embeds,\n",
    "    negative_image_embeds=negative_image_embeds,\n",
    ").images[0]\n",
    "\n",
    "image.save(\"burger.png\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Pass the cheeseburger to the [ShapEImg2ImgPipeline](https://huggingface.co/docs/diffusers/main/en/api/pipelines/shap_e#diffusers.ShapEImg2ImgPipeline) to generate a 3D representation of it."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from PIL import Image\n",
    "from diffusers import ShapEImg2ImgPipeline\n",
    "from diffusers.utils import export_to_gif\n",
    "\n",
    "pipe = ShapEImg2ImgPipeline.from_pretrained(\"openai/shap-e-img2img\", torch_dtype=torch.float16, variant=\"fp16\").to(\"cuda\")\n",
    "\n",
    "guidance_scale = 3.0\n",
    "image = Image.open(\"burger.png\").resize((256, 256))\n",
    "\n",
    "images = pipe(\n",
    "    image,\n",
    "    guidance_scale=guidance_scale,\n",
    "    num_inference_steps=64,\n",
    "    frame_size=256,\n",
    ").images\n",
    "\n",
    "gif_path = export_to_gif(images[0], \"burger_3d.gif\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div class=\"flex gap-4\">\n",
    "  <div>\n",
    "    <img class=\"rounded-xl\" src=\"https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/shap_e/burger_in.png\"/>\n",
    "    <figcaption class=\"mt-2 text-center text-sm text-gray-500\">cheeseburger</figcaption>\n",
    "  </div>\n",
    "  <div>\n",
    "    <img class=\"rounded-xl\" src=\"https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/shap_e/burger_out.gif\"/>\n",
    "    <figcaption class=\"mt-2 text-center text-sm text-gray-500\">3D cheeseburger</figcaption>\n",
    "  </div>\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Generate mesh"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Shap-E is a flexible model that can also generate textured mesh outputs to be rendered for downstream applications. In this example, you'll convert the output into a `glb` file because the 🤗 Datasets library supports mesh visualization of `glb` files which can be rendered by the [Dataset viewer](https://huggingface.co/docs/hub/datasets-viewer#dataset-preview).\n",
    "\n",
    "You can generate mesh outputs for both the [ShapEPipeline](https://huggingface.co/docs/diffusers/main/en/api/pipelines/shap_e#diffusers.ShapEPipeline) and [ShapEImg2ImgPipeline](https://huggingface.co/docs/diffusers/main/en/api/pipelines/shap_e#diffusers.ShapEImg2ImgPipeline) by specifying the `output_type` parameter as `\"mesh\"`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import torch\n",
    "from diffusers import ShapEPipeline\n",
    "\n",
    "device = torch.device(\"cuda\" if torch.cuda.is_available() else \"cpu\")\n",
    "\n",
    "pipe = ShapEPipeline.from_pretrained(\"openai/shap-e\", torch_dtype=torch.float16, variant=\"fp16\")\n",
    "pipe = pipe.to(device)\n",
    "\n",
    "guidance_scale = 15.0\n",
    "prompt = \"A birthday cupcake\"\n",
    "\n",
    "images = pipe(prompt, guidance_scale=guidance_scale, num_inference_steps=64, frame_size=256, output_type=\"mesh\").images"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Use the `export_to_ply()` function to save the mesh output as a `ply` file:\n",
    "\n",
    "> [!TIP]\n",
    "> You can optionally save the mesh output as an `obj` file with the `export_to_obj()` function. The ability to save the mesh output in a variety of formats makes it more flexible for downstream usage!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from diffusers.utils import export_to_ply\n",
    "\n",
    "ply_path = export_to_ply(images[0], \"3d_cake.ply\")\n",
    "print(f\"Saved to: {ply_path}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Then you can convert the `ply` file to a `glb` file with the trimesh library:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import trimesh\n",
    "\n",
    "mesh = trimesh.load(\"3d_cake.ply\")\n",
    "mesh_export = mesh.export(\"3d_cake.glb\", file_type=\"glb\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "By default, the mesh output is viewed from the bottom, but you can change the default viewpoint by applying a rotation transform:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import trimesh\n",
    "import numpy as np\n",
    "\n",
    "mesh = trimesh.load(\"3d_cake.ply\")\n",
    "rot = trimesh.transformations.rotation_matrix(-np.pi / 2, [1, 0, 0])\n",
    "mesh = mesh.apply_transform(rot)\n",
    "mesh_export = mesh.export(\"3d_cake.glb\", file_type=\"glb\")"
   ]
  },
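  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To see what that transform does, the sketch below builds the standard right-handed 4x4 rotation about the x-axis by `-np.pi / 2` with NumPy and applies it to a point on the +z axis. It assumes `trimesh.transformations.rotation_matrix` follows the same right-handed convention; the names `rot_x` and `z_axis` are only for illustration. The quarter turn moves whatever pointed along +z so that it points along +y, which is the up axis in the glTF convention used by `glb` files."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "theta = -np.pi / 2\n",
    "c, s = np.cos(theta), np.sin(theta)\n",
    "\n",
    "# Homogeneous rotation about the x-axis (right-handed convention).\n",
    "rot_x = np.array(\n",
    "    [\n",
    "        [1.0, 0.0, 0.0, 0.0],\n",
    "        [0.0, c, -s, 0.0],\n",
    "        [0.0, s, c, 0.0],\n",
    "        [0.0, 0.0, 0.0, 1.0],\n",
    "    ]\n",
    ")\n",
    "\n",
    "# A point one unit along +z, in homogeneous coordinates.\n",
    "z_axis = np.array([0.0, 0.0, 1.0, 1.0])\n",
    "\n",
    "# After the rotation the point lies one unit along +y.\n",
    "print(np.round(rot_x @ z_axis, 6))"
   ]
  },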
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Upload the mesh file to your dataset repository to visualize it with the Dataset viewer!\n",
    "\n",
    "<div class=\"flex justify-center\">\n",
    "    <img class=\"rounded-xl\" src=\"https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/3D-cake.gif\"/>\n",
    "</div>"
   ]
  }
 ],
 "metadata": {},
 "nbformat": 4,
 "nbformat_minor": 4
}
